RapidMiner Studio documentation, version 10.1

Extract Sentiment (Operator Toolbox)

Synopsis

This operator creates a sentiment score by applying either open source sentiment dictionaries or proprietary API methods to an existing text attribute. Depending on the method chosen, additional results can be exposed.

Description

Aylien: Uses the Aylien Sentiment Analysis API (https://aylien.com/text-api/sentiment-analysis/) to score the text. The Aylien "polarity" is exposed as the sentiment score. If you use the advanced output option, you also get the subjectivity confidence. Note that you need to create an Aylien API connection to use this method and that you are subject to Aylien's rate limits as described on their pricing page (https://developer.aylien.com/plans).

MeaningCloud: Uses the MeaningCloud Sentiment Analysis API (https://www.meaningcloud.com/developer/sentiment-analysis/) to score the text. Unlike the other methods, MeaningCloud does not offer a continuous sentiment score but a nominal "score_tag" with the following possible values: P+, P, NEU, N, N+, or missing. This operator calculates a sentiment score by mapping these values: P+ = 1, P = 0.75, NEU = 0, N = -0.75, and N+ = -1. Currently, only English is supported. If you use the advanced output option, you also get the unmapped score tag, agreement, subjectivity, confidence for the score tag, and irony. Note that you need to create a MeaningCloud API connection to use this method and that you are subject to MeaningCloud's rate limits as described on their pricing page (https://www.meaningcloud.com/products/pricing).
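
The tag-to-score mapping described above can be sketched in Python as follows. This is only an illustration of the mapping table, not the operator's implementation: the function name score_tag_to_score is hypothetical, and returning None for a missing or unrecognized tag is an assumption.

```python
from typing import Optional

# Numeric scores for MeaningCloud's nominal score_tag,
# using the values listed in the description above.
SCORE_TAG_MAP = {
    "P+": 1.0,
    "P": 0.75,
    "NEU": 0.0,
    "N": -0.75,
    "N+": -1.0,
}


def score_tag_to_score(score_tag: Optional[str]) -> Optional[float]:
    """Map a score_tag to a numeric score.

    Assumption: a missing or unknown tag yields a missing score (None).
    """
    if score_tag is None:
        return None
    return SCORE_TAG_MAP.get(score_tag)
```

For example, score_tag_to_score("N") gives -0.75, while a missing tag stays missing.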

SentiWordNet: Uses the SentiWordNet 3.0 sentiment dictionary (https://sentiwordnet.isti.cnr.it/) to score the text. SentiWordNet produces both a positivity and a negativity component score for each word. This operator calculates the difference of the two components to produce the sentiment score for each word, and then exposes the sum of all word scores in the text. If you use the advanced output option, you also get a nominal attribute with all words taking part in the scoring, the sum of positive components, the sum of negative components, and the number of used and unused tokens.

  • Reference: Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining
  • Website: https://sentiwordnet.isti.cnr.it/
  • License: CC BY-SA 3.0
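
The SentiWordNet scoring rule (positivity minus negativity per word, summed over the text) can be sketched as below. This is a minimal illustration under stated assumptions: the lexicon values are invented and are not actual SentiWordNet entries, the function name is hypothetical, and lowercasing the text and splitting on non-word characters are simplifications.

```python
import re
from typing import Dict, Tuple

# Toy lexicon: word -> (positivity, negativity).
# These values are invented for illustration only.
TOY_LEXICON: Dict[str, Tuple[float, float]] = {
    "good": (0.75, 0.0),
    "bad": (0.0, 0.625),
    "fine": (0.5, 0.125),
}


def sentiwordnet_style_score(text: str, lexicon: Dict[str, Tuple[float, float]]) -> float:
    """Sum (positivity - negativity) over all tokens found in the lexicon."""
    # Assumption: lowercase the text and split on runs of non-word characters.
    tokens = [t for t in re.split(r"\W+", text.lower()) if t]
    score = 0.0
    for token in tokens:
        if token in lexicon:
            positivity, negativity = lexicon[token]
            score += positivity - negativity
    return score
```

With this toy lexicon, "Good but bad" scores 0.75 - 0.625 = 0.125. Because the result is a sum over words, longer texts can produce scores outside the range -1 to +1.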

VADER: Uses the VADER (Valence Aware Dictionary and sEntiment Reasoner) lexicon and rule-based sentiment analysis tool (https://github.com/cjhutto/vaderSentiment) to score the text. VADER is specifically attuned to sentiments expressed in social media and produces scores based on a dictionary of words. This operator calculates and then exposes the sum of all word scores in the text. If you use the advanced output option, you also get a nominal attribute with all words taking part in the scoring, the sum of positive components, the sum of negative components, and the number of used and unused tokens.

  • Reference: Hutto, C.J. and Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
  • Website: https://github.com/cjhutto/vaderSentiment
  • License: MIT
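
The advanced output listed for the dictionary-based methods can be pictured with the following sketch. It covers only the dictionary sum described above and none of VADER's additional rules. The valence values are invented, and the function name and result keys are hypothetical; they are not the attribute names the operator produces.

```python
import re
from typing import Dict, List, Union

# Toy valence dictionary; values are invented for illustration only.
TOY_VALENCE: Dict[str, float] = {"great": 0.75, "love": 0.5, "awful": -0.75}


def advanced_output(text: str, valence: Dict[str, float]) -> Dict[str, Union[float, int, List[str]]]:
    """Compute a summed score plus the extra values described for advanced output."""
    # Assumption: lowercase the text and split on runs of non-word characters.
    tokens = [t for t in re.split(r"\W+", text.lower()) if t]
    used = [t for t in tokens if t in valence]
    positivity = sum(valence[t] for t in used if valence[t] > 0)
    negativity = sum(valence[t] for t in used if valence[t] < 0)
    return {
        "score": positivity + negativity,      # sum of all word scores
        "scoring_words": used,                 # words taking part in the scoring
        "positivity": positivity,              # sum of positive components
        "negativity": negativity,              # sum of negative components
        "used_tokens": len(used),
        "unused_tokens": len(tokens) - len(used),
    }
```

For the text "I love it, great! But awful service." this sketch finds three scoring words (love, great, awful) among seven tokens and returns a score of 0.5.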

VADER (French): Uses a French version of the VADER (Valence Aware Dictionary and sEntiment Reasoner) lexicon and rule-based sentiment analysis tool (https://github.com/cjhutto/vaderSentiment) to score the text.

  • Reference: Hutto, C.J. and Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
  • Website: https://github.com/thomas7lieues/vader_FR
  • License: MIT

VADER (German): Uses GerVADER, a German version of the VADER (Valence Aware Dictionary and sEntiment Reasoner) lexicon and rule-based sentiment analysis tool (https://github.com/cjhutto/vaderSentiment), to score the text.

  • Reference: Karsten Michael Tymann, Matthias Lutz, Patrick Palsbröker and Carsten Gips: GerVADER - A German adaptation of the VADER sentiment analysis tool for social media texts. In Proceedings of the Conference "Lernen, Wissen, Daten, Analysen" (LWDA 2019), Berlin, Germany, September 30 - October 2, 2019.
  • Website: https://github.com/KarstenAMF/GerVADER
  • License: MIT

Input

  • exa (Data Table)

    An ExampleSet with the text attribute to be processed.

Output

  • exa (Data Table)

    The original ExampleSet with a new "Score" attribute (and additional attributes if selected via the advanced parameters).

Parameters

  • model The model used to score the text.
    • aylien: Uses the Aylien Sentiment Analysis API (https://aylien.com/text-api/sentiment-analysis/). Note that you need to create an Aylien API connection to use this model and that you are subject to Aylien's rate limits as described on their pricing page (https://developer.aylien.com/plans).
    • meaning_cloud: Uses the MeaningCloud Sentiment Analysis API (https://www.meaningcloud.com/developer/sentiment-analysis/). Note that you need to create a MeaningCloud API connection to use this model and that you are subject to MeaningCloud's rate limits as described on their pricing page (https://www.meaningcloud.com/products/pricing).
    • sentiwordnet: Uses the SentiWordNet 3.0 sentiment dictionary (https://sentiwordnet.isti.cnr.it/). Reference: Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining.
    • vader: Uses the VADER (Valence Aware Dictionary and sEntiment Reasoner) lexicon and rule-based sentiment analysis tool (https://github.com/cjhutto/vaderSentiment). Reference: Hutto, C.J. and Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
    Range:
  • text_attribute The text attribute you want to score. Range:
  • show_advanced_output If this parameter is set to false, only a Score attribute will be created. These scores are not always normalized and depend on the individual model. If this parameter is set to true, additional attributes will be created. These additional attributes vary depending on which model is chosen. Range:
  • use_default_tokenization_regex If this parameter is set to true, the operator tokenizes the text for the SentiWordNet and VADER methods using the \W regular expression, which splits on every non-word character. If this parameter is set to false, a custom regular expression for tokenization is expected. Range:
  • tokenization_regex The custom tokenization regex used to split the text. Range:
  • additional_words This parameter allows you to add your own words to the scoring. Enter the word you want to add to the dictionary on the left-hand side and its score on the right-hand side. Scores are expected to be real values. All dictionary-based methods are scaled so that the maximum absolute score is 1, so it is recommended to use weights between -1 and +1. If you need to add many new words, it is recommended to use the "Dictionary Based Sentiment" operator, which allows you to create your own model. Range:
  • meaningcloud_connection This parameter is only available for MeaningCloud. It selects the predefined connection used to connect to the API. You can have many predefined connections and choose one of them from the drop-down box. Use the button next to the drop-down box to add a new connection or modify an existing one. Alternatively, click Manage Connections in the Connections menu of the main window. A new window appears, which sends you to the third-party API provider to create an account and produce an API key. The Test button in this window lets you check whether the connection works. Save the connection once the test is successful. After saving, the new connection can be chosen from the drop-down box of the connection parameter. Range:
  • aylien_connection This parameter is only available for Aylien. It selects the predefined connection used to connect to the API. You can have many predefined connections and choose one of them from the drop-down box. Use the button next to the drop-down box to add a new connection or modify an existing one. Alternatively, click Manage Connections in the Connections menu of the main window. A new window appears, which sends you to the third-party API provider to create an account and produce an API key. The Test button in this window lets you check whether the connection works. Save the connection once the test is successful. After saving, the new connection can be chosen from the drop-down box of the connection parameter. Range:
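
The effect of the tokenization and additional_words parameters can be sketched in Python as below. This is an illustration under stated assumptions, not the operator's code: the function names are hypothetical, empty tokens are dropped, and letting an added word override an existing dictionary entry is an assumption. Python's re module is used only for illustration; the operator's regex engine may treat some characters differently.

```python
import re
from typing import Dict, List, Optional

# Default splitting pattern named in the parameter description.
DEFAULT_TOKENIZATION_REGEX = r"\W"


def tokenize(text: str, use_default: bool = True, custom_regex: Optional[str] = None) -> List[str]:
    """Split text with the default \\W pattern or with a custom regex."""
    pattern = DEFAULT_TOKENIZATION_REGEX if use_default else custom_regex
    if pattern is None:
        raise ValueError("a custom regex is required when use_default is False")
    # Assumption: empty strings produced by adjacent separators are discarded.
    return [token for token in re.split(pattern, text) if token]


def with_additional_words(base: Dict[str, float], additional: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of the dictionary extended with user-supplied word scores.

    Assumption: an added word replaces an existing entry with the same key.
    """
    merged = dict(base)
    merged.update(additional)
    return merged
```

With the default pattern, tokenize("Hello, world!") yields ["Hello", "world"]. A custom pattern such as \s+ keeps hyphenated words like "state-of-the-art" together, which matters if such words appear in the dictionary or in additional_words.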

Tutorial Processes

Single Text Scoring